Abstract

Background: There is interest in improving the flavor of commercial strawberry (Fragaria × ananassa) varieties. Fruit
flavor is shaped by combinations of sugars, acids and volatile compounds. Many efforts seek to use genomics-based
strategies to identify genes controlling flavor, and then to design durable molecular markers to follow these genes in
breeding populations. In this report, fruit from two cultivars varying for presence/absence of volatile compounds, along
with segregating progeny, were analyzed using GC/MS and RNAseq. Expression data were bulked in silico according to
presence/absence of a given volatile compound, in this case γ-decalactone, a compound conferring a peach flavor
note to fruits.

Results: Computationally sorting reads in segregating progeny based on γ-decalactone presence eliminated transcripts
not directly relevant to the volatile, revealing transcripts possibly imparting quantitative contributions. One candidate
encodes an omega-6 fatty acid desaturase, an enzyme known to participate in lactone production in fungi, noted here
as FaFAD1. This candidate was induced by ripening, was detected in certain harvests, and correlated with γ-decalactone
presence. The FaFAD1 gene is present in every genotype where γ-decalactone has been detected, and it was invariably
missing in non-producers. A functional, PCR-based molecular marker was developed that cosegregates with the
phenotype in F1 and BC1 populations, as well as in many other cultivars and wild Fragaria accessions.

Conclusions: Genetic, genomic and analytical chemistry techniques were combined to identify FaFAD1, a gene
likely controlling a key flavor volatile in strawberry. The same data may now be re-sorted based on presence/absence
of any other volatile to identify other flavor-affecting candidates, leading to rapid generation of gene-specific markers.
Background

The commercial strawberry (Fragaria × ananassa) (2n = 8x = 56) is a popular
fresh and processed fruit with substantial value worldwide. It is recognized
for its sweet flavors and appealing aromas. The volatile profile of strawberry
is complex relative to other berries, with over 360 volatile compounds
reported [1]. A reduced set of approximately 20 volatiles is commonly reported
to be important for strawberry flavor [2-4]. The principal flavor compounds
include esters [5], ketones, terpenes [6], furanones [7], aldehydes [8],
alcohols, and sulfur-containing compounds. The concentrations of individual
volatiles are highly dependent on species [9], environment and harvest date
[2, 10-12], cultivar, post-harvest treatment and fruit developmental stage [13].

One important volatile compound is γ-decalactone (γ-D; CAS 706-14-9). This
volatile is described as "fruity", "sweet", or "peachy" [9] and contributes to
fruit aroma [14,15]. The volatile is undetectable in some genotypes [2], while
in others its accumulation varies greatly within and between harvest seasons
[12]. This pattern suggests that a critical biosynthetic step or substrate may
be missing or limited, and under strong environmental influence. The high
variability may be due to differences in expression of genes encoding enzymes
linked to the process. The observation that some genotypes never produce the
compound while others do presents an excellent basis for using global
transcriptome profiling to identify candidate genes associated with its
production or stability. Because the commercial strawberry is octoploid, F1
progeny from crosses between a volatile producer and a non-producer have been
used to infer the inheritance of a given volatile [2,16].

A number of researchers have used genomics approaches to identify marker-trait
associations in polyploids using SNPs. Advances in marker discovery have been
made in allohexaploid wheat (Triticum spp.) cultivars using the Illumina
GoldenGate Assay [17], allohexaploid oat (Avena sativa L.) using Roche 454
sequencing [18], and allotetraploid oilseed rape (Brassica napus) using
Illumina Solexa sequencing [19]. Other approaches have focused on developmental
changes in the transcriptome associated with ripening to identify gene
candidates. This strategy was effective for grape (Vitis spp.) using Illumina
sequencing [20], and recently in peach (Prunus persica L.) using microarrays [21].

The goal of this work is to use a transcriptome-based approach to identify the
genes required for γ-D production. The approach leverages the presence/absence
nature of γ-D in specific genotypes, its predictable inheritance, environmental
lability, and variation during the growing season. The analysis identified one
transcript from a narrow set of gene candidates that is functionally related to
genes implicated in biosynthesis of this compound in certain fungi [22-24] and
of the related compound γ-dodecalactone [25]. A PCR-based amplicon
corresponding to the candidate sequence co-segregates with the volatile in a
breeding population, corresponding backcrosses, and in select cultivars and
wild accessions. We demonstrate that computational bulking of RNAseq data based
on the presence or absence of a volatile can identify transcripts likely
playing a direct role in volatile production.

Results 

The ability to produce the γ-D volatile has been shown to segregate as a single
dominant locus, making it a prime candidate trait for the approach outlined in
Additional file 1: Figure S1. Briefly, a cross was constructed between
'Elyana', a γ-D producing cultivar, and 'Mara des Bois', a cultivar where γ-D
has not been detected. Progeny were grown, and fruits from each individual
plant were analyzed for volatiles and coincident gene expression. The fruits
from each plant were analyzed and sequenced separately so that transcriptomes
from producers and non-producers could be bulked computationally, with the
hypothesis that candidate genes would be common to producers, while being
expressed at low levels or going undetected in non-producers. Results could be
experimentally validated in the parental lines and in segregating progeny
using gene expression analysis.
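The bulking hypothesis above can be illustrated with a minimal sketch. It assumes expression has already been summarized as one RPKM value per gene per plant; the genotype names, gene names, values, and the detection threshold of 1.0 RPKM are all hypothetical placeholders and are not taken from the study.

```python
from typing import Dict, List

# Hypothetical per-plant expression values (RPKM). Names and numbers are
# placeholders for illustration only, not data from the study.
toy_rpkm: Dict[str, Dict[str, float]] = {
    "producer_1":     {"geneA": 35.0, "geneB": 12.0, "geneC": 4.0},
    "producer_2":     {"geneA": 18.5, "geneB": 10.0},
    "non_producer_1": {"geneB": 11.0},
    "non_producer_2": {"geneA": 0.2, "geneB": 9.5, "geneC": 3.0},
}


def bulk_candidates(
    expr: Dict[str, Dict[str, float]],
    producers: List[str],
    non_producers: List[str],
    detect: float = 1.0,  # assumed detection threshold, not from the paper
) -> List[str]:
    """Return genes detected in every producer and in no non-producer."""
    genes = set()
    for plant in producers + non_producers:
        genes.update(expr[plant])

    candidates = []
    for gene in sorted(genes):
        in_every_producer = all(
            expr[p].get(gene, 0.0) >= detect for p in producers
        )
        in_no_non_producer = all(
            expr[n].get(gene, 0.0) < detect for n in non_producers
        )
        if in_every_producer and in_no_non_producer:
            candidates.append(gene)
    return candidates


candidates = bulk_candidates(
    toy_rpkm,
    producers=["producer_1", "producer_2"],
    non_producers=["non_producer_1", "non_producer_2"],
)
```

Because each plant keeps its own expression record, the same table can be re-sorted by any other presence/absence phenotype simply by passing different producer and non-producer lists.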



γ-D quantity is genetically and environmentally influenced

The first tests examined γ-D accumulation in the 'Elyana' and 'Mara des Bois'
parental lines and representative progeny over a growing season, using
detection by GC/MS. Genotypes were assayed for γ-D production on three harvest
dates. The data are presented in Figure 1, which shows a single genotype
representing each of five general trends. Approximately 50% of the progeny
produced no γ-D, similar to 'Mara des Bois'. The largest portion of the γ-D
producers followed a trend similar to 'Elyana', with higher amounts in the
second harvest than in the first and third harvests. The reciprocal trend was
observed in five genotypes that showed less γ-D during the second harvest than
in the other two harvests. Three genotypes produced the highest amount in the
first harvest, with levels remaining low in the second and third harvests. A
single genotype exhibited higher levels as the season progressed. These same
volatile patterns were also observed in backcross progeny during the 2012/13
season (data not shown). While there is an approximately three-fold difference
in accumulation in the 'Elyana' background over the season, no γ-D was ever
detected in 'Mara des Bois' above background noise.
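The statement that roughly half of the progeny lack γ-D can be checked against a 1:1 expectation with a simple goodness-of-fit test. The sketch below assumes, as is usual for a single dominant locus carried in one dose by the producing parent, that a cross to a non-producer should give producers and non-producers in equal numbers. The counts (16 non-producers among the 35 progeny with three harvests) are read from the Figure 1 legend and caption; treating them as the full segregation data is an assumption.

```python
import math


def chi_square_one_to_one(n_class_a: int, n_class_b: int):
    """Chi-square goodness-of-fit test against a 1:1 ratio (1 degree of freedom)."""
    expected = (n_class_a + n_class_b) / 2
    chi2 = ((n_class_a - expected) ** 2 + (n_class_b - expected) ** 2) / expected
    # For 1 degree of freedom the chi-square survival function has the
    # closed form erfc(sqrt(chi2 / 2)), so no statistics library is needed.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts assumed from Figure 1: 35 progeny scored, 16 with no detectable volatile.
non_producers = 16
producers = 35 - non_producers

chi2, p_value = chi_square_one_to_one(producers, non_producers)
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")
```

With these counts the statistic is well below the 3.84 critical value for one degree of freedom at the 0.05 level, so the observed split does not depart detectably from 1:1.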

γ-Decalactone estimation

The levels of γ-D were estimated by comparing amounts detected by GC/MS in
berries from the population with standards prepared by adding the pure
volatile to half-ripe strawberry fruit. Figure 2 shows the γ-D volatile
phenotype for a subset of the 'Elyana' × 'Mara des Bois' progeny. The top
producing 30 genotypes from the 2012/13 season ranged from 0.018 to 0.035 mM
γ-D (data not shown).

Transcriptome profiling 

Fourteen progeny and both parents were individually analyzed by RNA-seq. The
γ-D non-producers in this analysis were 'Mara des Bois', 42, 89, 193, and 203.
The producers were 'Elyana', 6, 24, 37, 51, 91, 93, 98, 103, 152, and 204. Many
of the producers had higher γ-D levels than the 'Elyana' γ-D positive parent
(Figure 2). The transcriptomes from individual lines were computationally
pooled as "producer" or "non-producer". In total, over 106 million MID-tagged
RNAseq reads were generated across the parents and progeny. The number of
filtered and mapped reads averaged 5.5 million per genotype and ranged from 3
million to 8.5 million. Both parents had more than 7 million filtered, mapped
reads.

Approximately 17,000 out of ~31,000 annotated genes in the strawberry draft
genome [26] were represented in the RNA-seq dataset. A cursory SNP search
identified over 1.7 million SNPs in total when compared to the F. vesca genome
(SNP criteria: 95% minimum P not ref, 10 or greater read depth, and present in
at least two of the sixteen genotypes' datasets).

[Figure 1 plot: x-axis, harvests (H) within a season; legend entry: Not-Detectable ('Mara') (16)]

Figure 1 Representative genotypes showing γ-decalactone stability over a single growing season. 35 out of 130 'Elyana' × 'Mara des Bois'
progeny produced fruit suitable for volatile analysis over three consecutive harvests. All progeny could be clustered into five classes based on
γ-decalactone production as shown by five uniquely shaped graphs below. 'Mara des Bois' represents the class of non-producers, with 'Elyana'
representing those lines that peaked mid-season. Other patterns included the inverse of 'Elyana' with a "valley" pattern, a strong "decrease" after
the earliest harvest, and one progeny that showed an increase as the season progressed. Counts in each category are shown in the attached
legend. Data are from two technical replicates from one example genotype per class. Error bars represent standard deviations.
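The three SNP criteria stated above (at least 95% probability of not being the reference base, read depth of 10 or more, and support in at least two genotypes) can be expressed as a small filter. This is only a sketch of that logic: the `SnpCall` record, the position labels, and the example values are hypothetical, and the original analysis may have been run with a variant caller whose output has a different layout.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class SnpCall:
    genotype: str      # plant in which the call was made
    position: str      # hypothetical "linkage_group:coordinate" label
    p_not_ref: float   # probability that the base differs from the reference
    depth: int         # number of reads covering the position


def passing_snps(
    calls: List[SnpCall],
    min_p_not_ref: float = 0.95,
    min_depth: int = 10,
    min_genotypes: int = 2,
) -> List[str]:
    """Positions with qualifying calls in at least `min_genotypes` plants."""
    support: Dict[str, Set[str]] = {}
    for call in calls:
        if call.p_not_ref >= min_p_not_ref and call.depth >= min_depth:
            support.setdefault(call.position, set()).add(call.genotype)
    return sorted(
        pos for pos, plants in support.items() if len(plants) >= min_genotypes
    )


# Hypothetical calls for illustration only.
example_calls = [
    SnpCall("Elyana", "LG3:100", 0.99, 25),
    SnpCall("Mara des Bois", "LG3:100", 0.97, 12),
    SnpCall("Elyana", "LG3:200", 0.99, 30),         # seen in one plant only
    SnpCall("Elyana", "LG3:300", 0.96, 8),          # depth too low
    SnpCall("Mara des Bois", "LG3:300", 0.99, 40),
    SnpCall("42", "LG3:400", 0.90, 50),             # probability too low
    SnpCall("89", "LG3:400", 0.99, 15),
    SnpCall("93", "LG3:400", 0.95, 10),             # exactly at both thresholds
]

kept = passing_snps(example_calls)
```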

Gene expression trends in parental lines 

Alignment of all reads against the diploid F. vesca genome produced transcript
assemblies that provided a cursory view of gene expression differences between
the two parental genotypes. Comparisons of differentially expressed
transcripts, including transcripts with low RPKM representation, between the
'Mara des Bois' and 'Elyana' parents showed that among the 23,718 transcripts
predicted from assembly of all reads, 2,153 were unique to the 'Mara des Bois'
parent and 1,194 were only observed in 'Elyana' (Figure 3, Panel A). When
transcripts with higher RPKM values were compared, there were 157 that had a
5-fold or greater abundance in 'Mara des Bois' and 71 that had a >5-fold
abundance in 'Elyana'. When grouped by GO function, the differentially
expressed genes show no clear pattern (Figure 3B) favoring any one category.

Figure 2 γ-Decalactone production in a selection of progeny from the 'Elyana' × 'Mara des Bois' cross. Total volatiles were analyzed by
GC/MS. A number of progeny produced more γ-decalactone than 'Elyana' during the harvest shown. A subset of the progeny, along with the
'Mara des Bois' parent, never produced γ-decalactone above background levels. Ripe fruit samples from some of these genotypes were split
between volatile analysis and RNA-seq transcriptome analysis. Data are from two technical replicates. Error bars represent standard deviations.

Figure 3 Differential transcript accumulation in parental genotypes. A. Unique transcripts detected in each parent as well as
those shared between lines. The number of transcripts detected >5 fold is shown for each parent. B. MapMan distribution of
differentially expressed transcripts separated by GO terms. C. Transcript accumulation from genes associated with "lipid" annotation
in 'Elyana' (E), 'Mara des Bois' (M) and segregating progeny. The minus sign (-) indicates the inability to detect γ-decalactone in
those lines.

Table 1 shows twelve transcripts that were the most abundant in the γ-D
producing parent. The most highly expressed was an omega-6-fatty acid
desaturase transcript (gene24414), followed by a transcript annotated as
osmotin stress/defense (gene32423). The set also includes a serine-threonine
protein kinase (gene09445), a serine-threonine protein phosphatase
(gene00774), citrate synthase (gene26778), an F-box protein (gene12328), a
transcript annotated with proline transport (gene21705), and several
uncharacterized, hypothetical proteins.

Computational bulking to limit candidate set 

The large number of differentially expressed transcripts could be further
narrowed by analyzing transcript patterns for these genes in progeny
segregating for γ-D. Pairwise comparisons were made between genotypes with
high ('Elyana', 91, 24, 37, 103, and 006) and non-detectable ('Mara des Bois',
203, 89, and 42) γ-D levels. Gene candidates were filtered to require a modest
>4-fold increase in transcript support in producers over non-producers. Using
this approach, a single gene candidate was identified, gene24414 on linkage
group 3 (LG3:31112418..31114643, scf0513029:129621..131846), the same abundant
transcript shown in Table 1 as variable between the two parents. To illustrate
how the integration of segregating progeny can separate out transcripts not
common to γ-D volatile producers, all transcripts from the "lipids" category
are shown in Figure 3C for the parental genotypes and several of their
progeny. In general, lipid-related transcripts show limited differential
accumulation between any genotypes. The clear exception is the omega-6-fatty-acid
desaturase (bottom row), which is not detected in 'Mara des Bois' (M) or in
three of the progeny (green squares). The γ-D volatile was not detected in
these same genotypes. Of all candidates from Table 1, the omega-6-fatty-acid
desaturase was the only transcript that correlated 100% with the ability to
produce γ-D. The gene was given the designation FaFAD1. A 1,128 bp open
reading frame was cloned from 'Elyana' cDNA (Additional file 2).

Table 1 Transcripts abundant in gamma-decalactone-producing genotypes

| Feature ID | Seq. Description | GO biological process | GO cellular component | GO molecular function |
|---|---|---|---|---|
| gene24414 | Omega-6 fatty acid desaturase, endoplasmic reticulum isozyme 2-like | Unsaturated fatty acid biosynthesis process; oxidation-reduction process | Endoplasmic reticulum membrane; integral to membrane | Delta12-fatty acid dehydrogenase activity; oxidoreductase activity, acting on paired donors, with oxidation of a pair of donors resulting in the reduction of molecular oxygen to two molecules of water; omega-6 fatty acid desaturase activity |
| gene32423 | Osmotin-like protein osm34 | Defense response to bacterium, incompatible interaction; response to salt stress; defense response to fungus, incompatible interaction | #N/A | #N/A |
| gene26993 | NA | #N/A | #N/A | #N/A |
| gene09445 | Serine threonine-protein kinase rio1-like | Phosphorylation | #N/A | ATP binding; protein serine/threonine kinase activity |
| gene26778 | Citrate synthase | Tricarboxylic acid cycle; cellular carbohydrate metabolic process; response to cadmium ion | Cell wall; mitochondrial matrix; chloroplast | Zinc ion binding; citrate (Si)-synthase activity; ATP binding |
| gene14440 | NA | #N/A | #N/A | #N/A |
| gene13132 | PREDICTED: uncharacterized protein LOC101300103 | #N/A | #N/A | #N/A |
| geneujyz4 | ATP-dependent NAD(P)H-hydrate dehydratase-like | #N/A | #N/A | #N/A |
| gene12328 | F-box protein At5g07610-like | #N/A | #N/A | #N/A |
| gene21705 | Diphthamide biosynthesis protein 2-like | Proline transport | #N/A | #N/A |
| gene00774 | Serine threonine protein phosphatase | #N/A | #N/A | Hydrolase activity |
| gene23527 | Uncharacterized LOC101207862 | Generation of precursor metabolites and energy | Chloroplast; membrane; mitochondrion | #N/A |

Transcripts shown are those that correlate best with accumulation of gamma-decalactone (Pearson's p < 0.001).
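The pairwise filtering step described in this section can be sketched as follows. The sketch assumes that a gene is retained only if every high-producer plant exceeds every non-producer plant by more than 4-fold; whether the original analysis required all pairs or used another rule is not stated, and the pseudocount, plant labels, gene names, and RPKM values are hypothetical.

```python
from itertools import product
from typing import Dict, List

# Hypothetical RPKM values for illustration only.
toy_expression: Dict[str, Dict[str, float]] = {
    "high_1": {"gene_x": 40.0, "gene_y": 20.0},
    "high_2": {"gene_x": 12.0, "gene_y": 6.0},
    "none_1": {"gene_x": 0.0, "gene_y": 5.0},
    "none_2": {"gene_x": 1.0, "gene_y": 2.0},
}


def passes_all_pairs(
    gene: str,
    expr: Dict[str, Dict[str, float]],
    high: List[str],
    none: List[str],
    min_fold: float = 4.0,
    pseudocount: float = 0.5,  # assumed; avoids division by zero for undetected genes
) -> bool:
    """True if the gene is >min_fold higher in every high-vs-none plant pair."""
    for h, n in product(high, none):
        ratio = (expr[h].get(gene, 0.0) + pseudocount) / (
            expr[n].get(gene, 0.0) + pseudocount
        )
        if ratio <= min_fold:
            return False
    return True


def pairwise_filter(expr, high, none, min_fold=4.0):
    """Return the sorted list of genes that pass every pairwise comparison."""
    genes = set()
    for plant in high + none:
        genes.update(expr[plant])
    return sorted(
        g for g in genes if passes_all_pairs(g, expr, high, none, min_fold)
    )


retained = pairwise_filter(
    toy_expression, high=["high_1", "high_2"], none=["none_1", "none_2"]
)
```

Requiring every pair to pass is deliberately strict: a transcript that is high in the producing parent for unrelated reasons is unlikely to remain high in all producing progeny and low in all non-producing progeny.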

Validation of key candidates 

The steady-state transcript accumulation of FaFAD1 (Figure 4A), gene22642
(FaFAD2; Figure 4B), and gene29958 (a cytochrome p450 oxidase termed
FaCYTp450; Figure 4C) was tested. The results for FaFAD2 and FaCYTp450 are
not consistent with the ability to produce γ-D, and these genes were therefore
de-prioritized as candidates. The qRT-PCR results for FaFAD1 visually
correlated more closely with the volatile phenotype than the RNA-seq RPKM
values. Both methods were similar in failing to detect transcript support for
FaFAD1 in γ-D non-producers.

Candidate genes in fruit developmental series

An 'Elyana' fruit series was tested for ripening induction of FaFAD1, FaFAD2,
and FaCYTp450 as shown in Figure 5. The fold-change in transcript abundance
for each gene is shown for ripe fruit compared to blushing fruit. FaCYTp450
showed a >11-fold increase (+/- 0.30) in transcript abundance between ripe and
blushing fruit. FaFAD1 showed a >21-fold increase (+/- 0.33) in transcript
abundance. FaFAD2 was not ripening induced. These results were consistent over
at least three independent biological replicates.
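Fold-changes of this kind are often derived from qRT-PCR threshold cycles with the comparative Ct (2^-ΔΔCt) calculation. The text here does not state which quantification method was used, so the sketch below is an assumption, as are the reference-gene normalization, the 100% amplification efficiency, and the Ct values, which were chosen only to give a result of similar size to the FaFAD1 induction reported above.

```python
def fold_change_ddct(
    ct_target_test: float,
    ct_reference_test: float,
    ct_target_control: float,
    ct_reference_control: float,
) -> float:
    """Relative expression (test vs control) by the comparative Ct method.

    Assumes both amplicons double every cycle (100% efficiency).
    """
    delta_test = ct_target_test - ct_reference_test
    delta_control = ct_target_control - ct_reference_control
    delta_delta = delta_test - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: "test" is ripe fruit, "control" is blushing fruit.
ripe_vs_blushing = fold_change_ddct(
    ct_target_test=22.0,
    ct_reference_test=18.0,
    ct_target_control=26.4,
    ct_reference_control=18.0,
)
print(f"fold change, ripe vs blushing: {ripe_vs_blushing:.1f}")
```

A difference of 4.4 cycles after normalization corresponds to 2^4.4, or roughly 21-fold, which shows how a modest shift in Ct translates into a large change in transcript abundance.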

Candidate genes and two environments 

Environmental fluctuation of γ-D accumulation is shown in Figure 2. To test
whether FaFAD1 matched this pattern, transcript abundance was examined in
tissues obtained from two different harvests. The population average for all
γ-D producers in environment "a" was approximately eleven-fold lower than the
population average for γ-D in environment "b". γ-D non-producers had levels
of γ-D consistent only with noise during either harvest, but γ-D producers
showed strong environmental effects. Figure 6A shows the γ-D accumulation for
four genotypes (10, 93, 103, and 'Elyana') in these two environments. Figure 6B
shows the qRT-PCR results for these genotypes in the two environments when
transcript abundance in environment "b" was compared to environment "a". Only
modest increases are shown for genotypes 93, 103, and 'Elyana', with genotype
10 (non-producer) showing no evidence of FaFAD1 transcript accumulation.

[Figure 4 plot: genotypes grouped as γ-D non-producers and γ-D producers]

Figure 4 qRT-PCR results from three gene candidates correlated with the γ-decalactone phenotype. A single gene, FaFAD1 (gene24414),
(A) encoding a putative ω-6-fatty acid desaturase was identified as differentially expressed between high and low γ-decalactone genotypes.
Another putative fatty acid desaturase, FaFAD2 (gene22642), was found in the F. vesca genome and is shown in (B). Reducing the stringency by
reducing the number of progeny in each phenotypic pool resulted in another candidate (C) CYTp450 (gene29958), a putative cytochrome p450
monooxygenase, located in proximity to FaFAD1. qRT-PCR results are shown for each of these genes using 'Elyana' as the comparator against a
subset of progeny. Data are from three technical replicates with error bars representing standard deviations.



Molecular basis of γ-decalactone loss-of-function

The lack of detectable transcript in the 'Mara des Bois' parent and in
specific progeny may be due to at least one of two factors. First,
transcription or mRNA accumulation of the candidate FaFAD1 may be blocked.
Alternatively, the functional gene or allele may be missing altogether. To
test these possibilities, several primer pairs were designed to amplify the
genomic sequence upstream of and internal to FaFAD1. A map of the genomic
region and the corresponding primer pairs is provided in Figure 7. None of
the primer pairs could amplify products from genotypes unable to produce γ-D,
while amplicons across the region were produced from every plant where γ-D
was detected.

[Figure 5 plot: bars for FaFAD1, FaFAD2 and CYTp450]

Figure 5 qRT-PCR results for γ-decalactone gene candidates tested
against an 'Elyana' developmental fruit series. Three stages of
fruit were tested (green expanding, blushing, and full red, ripe). The
γ-decalactone phenotype is only detectable in fully ripe fruit. Only
comparisons between blushing and ripe developmental stages are
shown. The FaFAD1 and CYTp450 genes show ~21-fold and ~11-fold
(respectively) ripening induction in 'Elyana'. Data are from three technical
replicates with error bars representing standard deviations.

γ-Decalactone marker in three populations

The ability to amplify a product specifically in γ-D producing genotypes
provided an opportunity to design a gene-based molecular marker. A γ-D
PCR-based assay was designed using FaFAD1 primers and then tested against
three populations. The first was a subset of 19 genotypes from the original
'Elyana' × 'Mara des Bois' F1 population (Figure 8A). The 500 bp PCR amplicon
(solid arrow) was detected exclusively in genotypes shown to produce γ-D. The
positive PCR control (BFACT045) is shown by the dashed arrow. The second and
third populations tested were BC1 crosses to the 'Elyana' and 'Mara des Bois'
parents with progeny 98 as the male in each case. The 'Elyana' backcross
contained 26 progeny, each with at least three harvests during the 2012/13
growing season. The marker co-segregated with the phenotype in all cases.
Twenty-two progeny in the 'Mara des Bois' backcross had at least three
volatile harvests during the 2012/13 season. In each, the marker cosegregated
with the phenotype (data not shown).

The potential molecular marker was also tested against a set of cultivars in
which γ-D had been shown to be present or undetectable. 'Radiance', 'Albion',
'Winter Star', and 'Sweet Charlie' were all γ-D positive and also positive for
the PCR product (Figure 8C). 'Deutsch Evern', 'Strawberry Festival', 'LF9',
and 'Mieze Schindler' were all negative for γ-D and also negative for the
marker. 'Strawberry Festival' and 'LF9' were additionally interesting because
'LF9' is a seedling from self-pollination of 'Strawberry Festival' [27]. The
prediction would therefore be that 'LF9' would be negative for the marker and
the γ-D phenotype. This was confirmed by volatile and marker analysis.

Figure 6 Presence of γ-decalactone in non-inductive and
inductive environments, and their correlation with FaFAD1
transcript accumulation. Comparison of γ-decalactone detected
from environment 'a' (non-inductive) compared to environment 'b'
(inductive) (A). Genotypes where FAD1 transcript is detected produce
the volatile only in inductive environment 'b'. Others tested (e.g. line
10) did not produce detectable γ-decalactone in either environment.
(B) The relative FaFAD1 transcript accumulation for genotypes shown
in (A).

SSR marker development 

An SSR marker was developed to investigate cosegregation of alleles more
distantly positioned relative to FaFAD1. Figure 9 shows the results of the SSR
tested in the parents 'Elyana' and 'Mara des Bois', and in 15 progeny selected
from both γ-D producers and non-producers. Few progeny were tested because the
objective of the SSR marker design was simply to demonstrate the potential for
converting a gene candidate into a second type of molecular marker commonly
used for Fragaria genotyping. 'Elyana' exhibited four marker alleles (205, 209,
215, and 219), and 'Mara des Bois' possesses only the 209 allele. For clarity,
only the 205 and 209 alleles are shown. Allele 209 was monomorphic in all
genotypes tested, and alleles 215 and 219 were not associated with the γ-D
phenotype. Progeny 103, 152, 204, 24, 37, 51, 6, 91, 93, and 98 were all
positive for the γ-D phenotype and for allele 205. Progeny 171, 193, 203, 42,
and 89 were negative for γ-D and the 205 allele.
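The cosegregation reported above can be tabulated and tested directly from the progeny lists in the text. The sketch below builds the 2 × 2 table of phenotype against allele 205 and computes a one-sided Fisher's exact test using only the standard library. Applying Fisher's test here is an illustration added for this example, not an analysis reported by the authors, and with only 15 progeny it says nothing about how far the SSR lies from the causal gene.

```python
from math import comb

# Progeny identifiers as listed in the text.
producers = ["103", "152", "204", "24", "37", "51", "6", "91", "93", "98"]
non_producers = ["171", "193", "203", "42", "89"]
# Per the text, allele 205 was scored in exactly the producing progeny.
has_allele_205 = set(producers)


def cosegregation_table(pheno_pos, pheno_neg, marker_pos):
    """Return (a, b, c, d): phenotype+/marker+, +/-, -/+, -/- counts."""
    a = sum(plant in marker_pos for plant in pheno_pos)
    b = len(pheno_pos) - a
    c = sum(plant in marker_pos for plant in pheno_neg)
    d = len(pheno_neg) - c
    return a, b, c, d


def fisher_exact_one_sided(a: int, b: int, c: int, d: int) -> float:
    """Probability of a table at least this concordant under independence."""
    row_pos = a + b          # phenotype-positive plants
    col_pos = a + c          # marker-positive plants
    total = a + b + c + d
    denominator = comb(total, row_pos)
    numerator = sum(
        comb(col_pos, k) * comb(total - col_pos, row_pos - k)
        for k in range(a, min(row_pos, col_pos) + 1)
    )
    return numerator / denominator


table = cosegregation_table(producers, non_producers, has_allele_205)
p_value = fisher_exact_one_sided(*table)
print(table, f"p = {p_value:.6f}")
```

Perfect concordance in 15 plants gives a probability of 1 in 3003 under independence, which is why even this small subset is informative about linkage between the SSR allele and the phenotype.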



Discussion 

Fruit flavor and aroma profiles are shaped by a mixture of sugars, acids and
volatile organic compounds. Individual volatile components have been
demonstrated to play key roles in consumer liking, as noted in tomato [28]
and strawberry [29]. Several reports have detailed the importance of key
compounds to aroma production and the genetic loci or genes that control them
[6,16]. One component contributing to flavor in strawberry fruits is
γ-decalactone (γ-D). This compound is characteristic of the recognized aromas
of peaches and apricots, and is of interest to industry as a flavoring agent.
The synthesis of γ-D is not well understood in plants. However, several
microorganisms can generate γ-D from fatty-acid substrates [30] and are used
as bioreactors to produce this flavor compound.

Figure 8 Co-segregation of the FaFAD1 PCR-based marker with the γ-decalactone phenotype. The FaFAD1-based PCR product is denoted
by the single arrow and migrates at 500 bp by design. The dashed arrow denotes a positive PCR control (BFACT045) that is located in proximity
seven F. virginiana accessions (number = GRIN PI accession). (C) The correlation of the PCR product and γ-decalactone phenotype in cultivars

To identify transcripts related to γ-D production, two parents varying for the
volatile were crossed. 'Elyana' is a large, firm berry bred for production in
Florida [31]. 'Mara des Bois' is smaller and soft, and is not used in wide
commercial cultivation. Fruits (red, ripe fruits of similar size, age, and
environment) were assayed for the volatile and also for transcripts associated
with receptacle/achene ripening. The RNAseq data from producers and
non-producers were computationally bulked to identify sequences common to each
pool (Additional file 1: Figure S1). Strawberry fruit transcriptomes have been
examined previously [32-34], and show changes in a substantial number of
transcripts throughout the ripening process. The commercial strawberry is
octoploid and highly heterozygous [35]. Crossing two plants produces a myriad
of progeny phenotypes due to contributions from homoeologous genes. The
segregation within subgenomes likely provides additional "noise" that allows
transcripts relevant to the trait to separate clearly from others, focusing
the candidate set. Transcriptomes for each plant were considered
independently, so the transcriptomes of γ-D producers and non-producers could
be compared in silico.

The test began with detection of γ-D. The γ-D producers show fluctuations in
volatile quantity throughout the growing season (Figure 1). The genotypes with
the highest concentrations were estimated to possess between 0.028 and 0.035
mM γ-D (data not shown). γ-Decalactone accumulation was never detected in the
'Mara des Bois' parent. The range of differences observed in parental fruits
and progeny presented an ideal situation to assay global gene expression
coinciding with volatile production.

When progeny were sorted by presence/absence of γ-D, a small subset of
transcripts was significantly different between phenotypic groups. Pairwise
comparisons of RNAseq data around a fruit phenotype resolved to an especially
strong single candidate, a transcript encoding an omega-6-fatty acid
desaturase (FaFAD1). This gene is located near a QTL previously reported to
explain about 90% of the γ-D phenotype in strawberry [16]. Another gene
recently found in peach (PpFAD_1B-6, 71% identity and 82% similarity to FaFAD1
at the protein level) was correlated with γ-D in ripening fruit, and
demonstrated ω-6 oleate desaturase activity when overexpressed in yeast [21].
Fatty acid desaturases catalyze the formation of double bonds in fatty acyl
chains. The report from Sanchez [36], and the biochemistry inferred from
protein sequence, each suggest a role for this gene in lactone production, but
the precise biochemical steps have not been demonstrated. Future work will
examine the role of this transcript in transgenic lines.

FaFAD1 transcript abundance correlated well with the presence of γ-D. The
transcript accumulated with ripening, and only in certain environmental
conditions (Figure 5). The ability to produce γ-D segregated as a single
dominant locus, consistent with previous reports [16]. A PCR-based survey of
materials from the population indicated that the FaFAD1 genomic sequence could
not be amplified from γ-D non-producers, suggesting gene deletion or radical
alteration affecting PCR amplification. Eight combinations of ten distinctly
positioned primers were used to amplify regions upstream of and within the
FaFAD1 gene (shown in Figure 7). In each case, 'Elyana' genomic template
amplified the expected fragments, but 'Mara des Bois' produced no amplicons.
This is consistent with the idea that a deletion is responsible for the
absence of FaFAD1-related sequence in the 'Mara des Bois' genome, and this
would explain the dominant, single gene effect for the γ-D phenotype. This
finding allowed for development of a PCR-based molecular marker to identify
genotypes with the potential to produce γ-D. PCR primers corresponding to the
5' sequence of FaFAD1 were used to amplify the region shown in Figure 7
(primers A and B). The presence of the PCR product co-segregated precisely
with the ability to produce γ-D in the parental cross, in progeny, and in
backcross populations (Figure 8A).

The nature of the deletion is further exemplified using simple sequence repeat
(SSR) markers, tools frequently used to fingerprint strawberry germplasm
[37-39]. The availability of the genome of the diploid strawberry, F. vesca
[26], makes it possible to position the microsatellite sequences within the
structural context of the strawberry genome. One SSR sequence is located 11 kb
from the FaFAD1 gene, and polymorphisms would be predicted to segregate with
the candidate gene. The results in Figure 9 show that the presence of allele
205 is a reliable predictor of the ability to produce γ-D, at least in the
subset of the population evaluated. These data are important because they
provide two independent, PCR-based tests that can differentiate γ-D producers
and non-producers. Although it was tested only in the parental genotypes and
in several progeny, this second primer set offers another potential molecular
marker that may be used to verify results from the FaFAD1 sequence. The
findings also indicate that the SSR allele is present in the 'Elyana' parent
and not in the other, suggesting that the missing FaFAD1 gene may be part of a
larger deletion.

The potential utility of this molecular marker was demonstrated when it was
applied to a set of unrelated germplasm that also varied for presence/absence
of γ-D. In this case, the PCR product could only be amplified in genotypes
demonstrated to produce γ-D. The evaluation was extended to wild germplasm
where fruit could be obtained. Even in these distant accessions, the PCR
product was only amplified in lines where γ-D was detectable. The putative
marker not only works within a population, but likely will work across
commercial breeding populations and wild octoploid accessions. Future studies
will track the gene in diploid germplasm in an attempt to reconstruct its
origins, as was done with alleles to trace linalool-producing variants of
FaNES1 [6,40].

There are notable limitations to the approach used in
this study. The visual correlation between the γ-D
phenotype and qRT-PCR results is shown in Figures 4 and 5.
This quantitative trend, however, is not consistent in the
RNA-seq RPKM data (data not shown). γ-D producers,
irrespective of volatile amount produced, could not be
distinguished quantitatively, though producers and
non-producers were still easily discernible. This suggests that
while the RNA-seq approach worked well for the qualitative
γ-D phenotype, uncovering quantitative γ-D effects
would be very challenging. This further underscores the
need to incorporate independent methods (like qRT-PCR)
for examining candidate genes in the early stages of
discovery.

The development of two independent molecular markers
that segregate with the phenotype in a wide range of
germplasm will permit improved parental selection and
rapid screening of progeny possessing the ability to
produce γ-D. A simple and robust assay can rapidly
eliminate individual plants from breeding populations, saving
time, fuel, space and other resources. Most importantly,
the marker allows more rapid integration of a fruity
volatile into advanced selections.

Conclusions 

The results of this study demonstrate that gene candidates
for strawberry fruit traits may be identified by integrating
careful phenotyping and transcriptomic analysis with
genetics. This approach rapidly reduced the complexity
of the octoploid transcriptome down to a single
candidate gene. The identified gene, FaFAD1, is functionally
equivalent to genes involved in γ-D synthesis in fungi
and potentially in peach [21]. Gene identification led to the
development of a gene-based marker, enabling selection for
γ-D producers at the seedling stage. Moving rapidly from
candidate to marker has the potential to increase breeding
efficiency and reduce the downstream costs associated with
maintaining plants lacking a favorable trait. Looking
forward, γ-D is only one of many volatiles that can be
analyzed with this approach. The same dataset
may now be re-sorted to identify candidates for other
volatiles. The bulk sorting of polyploid transcriptomes
is a rapid and cost-effective means to identify a testable
suite of genes contributing to a given trait.

Methods 

Plant materials 

Parental lines were Fragaria × ananassa 'Elyana' (female,
γ-D producer) and 'Mara des Bois' (male, γ-D
non-producer). F1 progeny from the 'Elyana' × 'Mara des Bois'
cross were clonally multiplied in a Colorado summer
nursery in 2010. Two runner tips from each of approximately
130 seedlings were harvested from the nursery and grown
at the Gulf Coast Research and Education Center
(GCREC) in Wimauma, Florida during the 2010/11
winter season. These plants were evaluated in the field
for horticultural qualities such as yield and fruit size
and for volatile diversity using GC/MS. A subset of this
population was selected based on flavor volatile diversity
and superior plant performance. These selections were
further clonally multiplied and grown again in the 2011/12
and 2012/13 seasons.

BC1 populations were made by crossing one γ-D-positive
progeny (progeny 98, male) to each parent (female) in
2011. Backcross progeny were increased in the summer
nursery as described above. During the 2012/13 season,
the progeny were transplanted to the same commercial
strawberry growing system in Florida and analyzed for
volatiles and other horticultural traits. Other cultivars for
marker validation were maintained at GCREC. All other
genotypes used in this study were obtained from the
Germplasm Resources Information Network (GRIN)
repository in Corvallis, OR, and maintained in the field
at GCREC.



Fruit volatile analysis 

All fruiting progeny from the 'Elyana' × 'Mara des Bois'
cross were analyzed for volatiles by GC/MS during the
2010/11 season. Harvest dates were January 20, February
11, February 25, and March 18, 2011. Backcross
populations were harvested during the 2012/13 season on
January 13, January 31, and March 7, 2013.
Data from the 2010/11 harvests were used to select
genotypes segregating for volatiles of interest. Fruit processing
for volatile analysis was conducted as follows. A
representative ~25 g sample was collected from five to six fully
ripe, clean, and normal-shaped berries from each
genotype. The calyx from each berry was removed, and
berries were blended with an equal weight of saturated
NaCl solution (~35% NaCl in molecular biology grade
water). The volatile 3-hexanone was added as an
internal standard to a final concentration of 1 ppm prior
to blending. Five ml aliquots were dispensed into 20 ml
glass vials and sealed with magnetic crimp caps (Gerstel,
Baltimore, MD, USA). Two technical replicates were
processed for each genotype at each harvest. Samples
were frozen at -20°C until analysis by Gas
Chromatography/Mass Spectroscopy.

Gas chromatography/mass spectroscopy (GC/MS) 

A 2 cm tri-phase SPME fiber (50/30 µm
DVB/Carboxen/PDMS, Supelco, Bellefonte, PA, USA) was used
to collect and concentrate volatiles prior to running on
an Agilent 6890 GC coupled with a 5973 N MS
detector (Agilent Technologies, Palo Alto, CA, USA).
Before analysis, samples were held at 4°C in a Peltier
cooling tray attached to a MPS2 autosampler (Gerstel).
All other volatile sampling and analysis methods were
as previously described [15]. The volatile 3-hexanone
was used as an internal control. An authentic γ-D
standard (Sigma Aldrich, St. Louis, MO, USA) was run under
the same chromatographic conditions as berry samples
for verification of volatile identity. The area of each γ-D
peak was normalized to the peak area of the internal
standard, and normalized peak areas were compared
between samples.
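
The normalization described above can be illustrated with a minimal sketch. The function name, sample labels, and peak areas below are hypothetical values chosen for illustration and are not data from this study.

```python
def normalize_peak(analyte_area, internal_standard_area):
    """Return the analyte peak area relative to the internal-standard peak area."""
    if internal_standard_area <= 0:
        raise ValueError("internal standard peak area must be positive")
    return analyte_area / internal_standard_area


# Hypothetical peak areas (arbitrary detector units), not measured values.
samples = {
    "producer": {"gamma_D": 4.0e6, "hexanone": 2.0e6},
    "non_producer": {"gamma_D": 0.0, "hexanone": 2.5e6},
}

# Normalized areas can be compared between samples because each is
# expressed relative to the same spiked amount of 3-hexanone.
normalized = {
    name: normalize_peak(areas["gamma_D"], areas["hexanone"])
    for name, areas in samples.items()
}
```

Dividing by the internal-standard area is intended to offset run-to-run differences in fiber loading and detector response.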

Estimation of γ-decalactone in strawberry fruit

A standard curve was made in half-ripe F. × ananassa
'Winterstar' fruit puree to estimate the amount of
γ-D. This approach mimics volatile detection in ripe
fruit. Half-red fruit were processed with saturated NaCl
and 3-hexanone as described in Fruit volatile analysis
above. Pure γ-D (Sigma; St. Louis, MO) was added to
puree aliquots at concentrations from 0.005 to 0.3 mM,
vortexed thoroughly, and then analyzed by GC/MS as
with all other samples. Baseline γ-D was identified in
puree sampled without pure volatile added.
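
One way to use such a standard curve is sketched below. It assumes a linear detector response over the spiked range, which the text does not state, and the response values are invented so that the fit is easy to check. All function names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_concentration(response, slope, intercept):
    """Invert the fitted line to estimate concentration from a response."""
    return (response - intercept) / slope


# Spiked concentrations (mM) spanning the range given in the text.
spiked_mM = [0.005, 0.05, 0.1, 0.3]
# Hypothetical normalized peak areas, generated as 10 * c + 0.1.
responses = [0.15, 0.6, 1.1, 3.1]

slope, intercept = fit_line(spiked_mM, responses)
estimate = estimate_concentration(2.1, slope, intercept)
```

Under this assumption the intercept plays the role of the baseline signal seen in puree with no pure volatile added.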

Combined volatile and RNA-seq tissue

Fruit for volatile and RNA-seq analyses were harvested
on December 15, 2011. Fourteen progeny and both
parents ('Elyana' and 'Mara des Bois') were selected to
maximize representation of segregating volatiles. Eight
to ten fully ripe fruit were collected from each genotype
and cleaned, the calyx was removed, and each fruit was
split longitudinally. Half of each genotype's sample was
processed for volatile analysis as described above, and the
other half was flash frozen in liquid nitrogen and stored
at -80°C until RNA extraction. A blank was processed
between each GC/MS sample to minimize cross-sample
contamination.

RNA extraction 

Frozen berries were crushed and then ground to a
fine powder in a liquid nitrogen-cooled coffee grinder
(KitchenAid Blade Coffee Grinder, St. Joseph, MI, USA).
RNA was extracted using a modified method [41]. Two
grams of fruit powder were used per 5 ml extraction buffer.

RNA was treated with DNase I, RNase-free (Fermentas,
Waltham, MA, USA) according to the manufacturer's
instructions and then cleaned using the Qiagen RNeasy
Mini Kit (Qiagen, Valencia, CA, USA). RNA quality was
checked on a Bioanalyzer prior to RNA-seq library
construction. Each sample was individually barcoded during
library construction. Two lanes with eight libraries each
were run on an Illumina Genome Analyzer IIx (Illumina,
San Diego, CA, USA).

Transcriptome analysis 

Reads that passed quality checks were aligned to the F.
vesca genome using either a custom script or SeqMan
NGen (DNASTAR version 2.3, Lasergene, Madison, WI,
USA). F. vesca Genemark Hybrid version 1.1 was used
for gene annotations. Gene candidates were identified in
QSeq (DNASTAR) by making pairwise comparisons
between high and low γ-D genotypes. Each pairwise
comparison excluded genes with less than a 2- to 4-fold
difference in abundance between high and low γ-D
genotypes. The genes remaining after these comparisons
became gene candidates and were analyzed further by
qRT-PCR.
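
The pairwise filtering logic can be sketched as follows. This is not the QSeq procedure itself: the pseudocount, the requirement that a gene pass every high versus low pair, and all gene and genotype names are assumptions made for illustration.

```python
def passes_fold_threshold(high, low, min_fold, pseudocount=1.0):
    """True if 'high' exceeds 'low' by at least min_fold.

    The pseudocount (an assumption, not from the source) avoids
    division by zero when a transcript is absent in the low genotype.
    """
    return (high + pseudocount) / (low + pseudocount) >= min_fold


def candidate_genes(expression, high_genotypes, low_genotypes, min_fold=2.0):
    """Keep genes that pass the threshold in every high-vs-low pair."""
    candidates = []
    for gene, by_genotype in expression.items():
        if all(
            passes_fold_threshold(by_genotype[h], by_genotype[l], min_fold)
            for h in high_genotypes
            for l in low_genotypes
        ):
            candidates.append(gene)
    return sorted(candidates)


# Hypothetical expression values for three genes in four genotypes.
expression = {
    "geneA": {"H1": 120.0, "H2": 90.0, "L1": 2.0, "L2": 0.0},
    "geneB": {"H1": 50.0, "H2": 45.0, "L1": 40.0, "L2": 55.0},
    "geneC": {"H1": 30.0, "H2": 3.0, "L1": 1.0, "L2": 2.0},
}

result = candidate_genes(expression, ["H1", "H2"], ["L1", "L2"], min_fold=2.0)
```

Requiring agreement across all pairs is what removes transcripts that differ between two individuals for reasons unrelated to the volatile, such as geneC in this toy example.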

A separate analysis was made using CLC Genomics
Workbench software v6.5.0 (CLC Bio, Cambridge, MA,
USA). The alignment parameters were adjusted to allow
for expected variations in the octoploid genome relative to
the diploid one (Minimum length fraction = 0.6; Minimum
similarity fraction = 0.5; Maximum number of hits for
a read = 5). RPKM values were used to normalize the
expression data sets among individuals in the population.
The overexpression data in Figure 3A represent
transcripts with an RPKM > 10 in at least one of the
cultivars. A transcript was considered overexpressed when it
was at least 5 times more abundant in one
genotype than the other. The functional classification of the
differentially expressed genes was performed using
MapMan ontology. The heatmap used for the comparison of
parents was designed using MapMan software (MapMan
version 3.6.0RC1). A dot represents the log2 of the RPKM ratio
of a transcript between the 'Mara des Bois' and 'Elyana'
parents.
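
The thresholds above can be expressed as a short sketch. It uses the standard RPKM definition (reads per kilobase of transcript per million mapped reads); the handling of a zero denominator and the function names are assumptions, and the numbers are illustrative only.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def is_overexpressed(rpkm_a, rpkm_b, min_rpkm=10.0, min_fold=5.0):
    """Apply the two criteria from the text to a pair of genotypes.

    The transcript must exceed min_rpkm in at least one genotype and
    differ by at least min_fold between them. Treating a zero value in
    the lower genotype as passing is an assumption of this sketch.
    """
    high, low = max(rpkm_a, rpkm_b), min(rpkm_a, rpkm_b)
    if high <= min_rpkm:
        return False
    if low == 0:
        return True
    return high / low >= min_fold


def log2_ratio(rpkm_a, rpkm_b):
    """log2 of the RPKM ratio; both values must be positive."""
    return math.log2(rpkm_a / rpkm_b)


# Illustrative numbers: 500 reads on a 1 kb transcript, 10 million mapped reads.
example_rpkm = rpkm(500, 1000, 10_000_000)
```

For instance, a transcript at RPKM 50 in one parent and 5 in the other passes both criteria, whereas 50 versus 20 fails the fold criterion.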

qRT-PCR 

cDNA templates for qRT-PCR were synthesized using
the ImProm-II Reverse Transcriptase kit (Promega,
Madison, WI, USA) according to the manufacturer's
protocol. The cDNA was diluted 1:10 prior to qRT-PCR
analysis. All qRT-PCR reactions were run in 20 µl
reactions using EvaGreen qPCR Mastermix-ROX (Applied
Biological Materials Inc., Richmond, BC, Canada). Each
reaction contained 10 µl 2x EvaGreen mastermix, 2 µl
primer mix (2 µM each), 1 µl 1:10 diluted cDNA, and
7 µl DNase/RNase free water. All qRT-PCR primers were
designed using qRT primer design tools available online
(idtdna.com), and designed to amplify fragments between
95 and 110 base pairs. Each primer-template combination
was run with three technical replicates and three biological
replicates. A conserved hypothetical protein (FaCHP1) [42]
was used as a housekeeping control (5'
TGCATATATCAAGCAACTTTACACTGA 3' forward and 5'
ATAGCTGAGATGGATCTTCCTGTGA 3' reverse). The
qRT-PCR was run on an Applied Biosystems StepOnePlus
Real-Time PCR System using StepOne Software (v2.0)
(Applied Biosystems, Foster City, CA, USA). The qRT-PCR
data were analyzed using the comparative Ct method
(ΔΔCt) following the manufacturer's directions.
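
The comparative Ct calculation can be sketched as below. The sketch assumes roughly 100% amplification efficiency for both primer pairs (a doubling per cycle), and the Ct values are invented for illustration; the choice of calibrator sample is likewise an assumption.

```python
def mean(values):
    """Arithmetic mean, used here to average technical replicates."""
    return sum(values) / len(values)


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Fold change of a target gene by the comparative Ct method.

    Each Ct is first normalized to the reference (housekeeping) gene,
    then the sample is compared with a calibrator sample.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical technical replicates for a target gene in one sample.
target_ct = mean([22.1, 21.9, 22.0])

# Hypothetical reference-gene and calibrator Ct values.
fold = relative_expression(target_ct, 18.0, 25.0, 18.0)
```

With these numbers the target is detected three cycles earlier in the sample than in the calibrator after normalization, which corresponds to an approximately eight-fold difference.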

Candidate genes from the RNA-seq results were validated
by qRT-PCR against templates from multiple genotypes,
developmental stages, and/or environments. Candidates
were initially validated using cDNA templates from all
16 genotypes included in the RNA-seq dataset. Further,
candidates were tested for induction during ripening in
'Elyana' (only ripe fruits have detectable γ-D), and in two
environments with low or high γ-D production. The
qRT-PCR primer sequences for FaFAD1 (gene24414) were forward
5' GTGCCCTTACTGATAACAAACG 3' and reverse
5' TCGCAACCAATCCCACTC 3', for CYTp450
(gene29958) forward 5' ACCCAAAGGTCTATCACATGAC
3' and reverse 5' TGAGCTTCAGTTCCTAACCAC 3',
and for FaFAD2 (gene22642) forward 5'
AACTGGTGTCTGGGTCATTG 3' and reverse 5'
GAAAGGAGTGAAGGATCAGGC 3'.

Cloning full length transcript for FaFAD1

The full length transcript for gene24414 was cloned using
primer sequences guided by the transcriptome data.
Primers were designed to include attB sites for Gateway
(Invitrogen) cloning into pDONR222. Forward primer 5'
AAAAAGCAGGCTGCATGGGAGCCGATACCAAGTTCGAAGAG 3'
and a poly T reverse primer 5' AGAA
AGCTGGGTGTTTTTTTTTTTTTTTTTTTTTTTTTT 
TTTTTTTTTTT 3' were initially used to obtain the 
full length transcript including 3' UTR. A second reverse
primer was designed from the cloned sequence up to and
including the stop codon from the open reading frame
(reverse 5'
AGAAAGCTGGGTGTTAGTTCCGGTACCAAAAAACACCTTTGGT 3').
A second round of PCR reconstructed full length attB
sites using forward 5'
GGGGACAAGTTTGTACAAAAAAGCAGGCT 3' and reverse 5'
GGGGACCACTTTGTACAAGAAAGCTGGGT 3' primers.
Standard PCR conditions were used with GoTaq
polymerase (Promega) according to the manufacturer's
recommendations. The full length sequence, predicted
protein translation, and physical map coordinates are
provided in Additional file 2.

Designing a functional molecular marker 

A molecular marker for γ-D production was designed
using the F. vesca genome sequence [26]. Primers were
designed to amplify a 500 bp fragment from the 5' end
of gene24414 into the 5' UTR: (forward) 5'
CGGGKTTAATGGTTTTGTTGTTGACCGAGG 3' and (reverse)
5' GTAGAAGAGAGAGAGCAAGAGGAG 3'. BFACT045
primers were previously shown to be linked to γ-D
production and these primers were used as PCR controls
(forward 5' GGAGAAATGTAGTTGGTAGTGTTGTGA
3' and reverse 5' GAGGGAGAAGTGTTTTTGGTG 3')
[43]. The BFACT045 primers did not produce alleles that
segregated with the γ-D phenotype in our populations as
tested by capillary electrophoresis (data not shown). All
PCR reactions (12.5 µl total volume) were run using
GoTaq Hot Start polymerase (Promega) according to
the manufacturer's instructions. Thermocycler
conditions were as follows: 94°C for 4 min, followed by 25
cycles of 94°C for 30 s, 56°C for 30 s, and 72°C for 30 s,
with a final extension for 10 min at 72°C.

The putative marker for gene24414 was tested against
a subset of 17 progeny from the 'Elyana' × 'Mara des Bois'
cross including parents, and two backcross populations to
test segregation with the γ-D phenotype. Progeny were
included in this analysis if they had three or more volatile
samplings during the season to ensure accurate
phenotyping of γ-D non-producers. Backcross population 'Elyana' ×
progeny 98 produced 26 progeny suitable for analysis, and
backcross population 'Mara des Bois' × progeny 98 had 22
progeny suitable for analysis. Other octoploids suitable for
consideration as breeding parents and/or for this study
were also tested for marker cosegregation with the γ-D
phenotype. These genotypes included 'Radiance', 'Deutsch
Evern', 'Festival', 'LF9' (a self-pollination of 'Festival'),
'Albion', 'Winterstar'™ ('FL 05-107'), 'Mieze Schindler',
'Sweet Charlie', and 'Winter Dawn'.



A selection of wild, octoploid genotypes were also tested 
for the presence of the gene24414 marker, but were only 
included in this study if a minimum of three volatile 
samplings were performed. These genotypes are listed 
here by their Germplasm Resources Information Network 
Plant Inventory (PI) number and include accessions 
236579, 612323, 612495, 612498, and 612499. 

Statistical marker analysis 

Chi-square analysis was performed on marker and volatile
data for all genotypes with at least three separate harvests
during a season. Due to a strong environmental
component, three harvests were necessary to provide the strongest
evidence for accurately identifying a γ-D non-producer.
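
The text does not specify which form of the chi-square test was applied. As one plausible reading, the sketch below runs a Pearson test of independence on a 2 x 2 table of marker presence against γ-D production, without continuity correction. The counts are hypothetical and are not the counts observed in this study.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2 x 2 table.

    table = [[a, b], [c, d]], with rows as marker present/absent and
    columns as producer/non-producer. Returns (chi2, p_value) for one
    degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain observations")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical perfect cosegregation: marker present only in producers.
chi2, p_value = chi_square_2x2([[10, 0], [0, 10]])
```

With small expected counts an exact test may be more appropriate than the chi-square approximation; that choice is outside what the text describes.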

SSR marker design 

A 50 kb region of the genome surrounding gene24414 
was downloaded using the Strawberry Genome Browser 
at the Genome Database for Rosaceae (rosaceae.org). 
The web version of BatchPrimer3 [44] was used to search 
for Simple Sequence Repeats (SSRs) near gene24414. 
Primers were designed to flank a dinucleotide repeat 
approximately 11 kb from gene24414. The primer pair 
(forward 5' TGTAAAACGACGGCGAGTGAAGAAG 
ATGACAGTAGGGACGAGGAAG 3' and reverse 5' 
GTTGTATGTGAGAAGATGGGAAGAAAGATGAC 3'
with fluorescently labeled 5' 6FAM-TGTAAAACGAC 
GGCCAGT 3') exhibited variation between 'Elyana' and 
'Mara des Bois' and segregation in the fifteen progeny 
tested. These primers were used for allele detection during 
capillary electrophoresis as previously described [44]. 
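
The SSR search itself was done with BatchPrimer3. Purely as an illustration of what a dinucleotide repeat search involves, the sketch below scans a sequence with a regular expression. The minimum repeat number, the function name, and the test sequence are assumptions and do not reproduce BatchPrimer3 settings or the sequence near gene24414.

```python
import re


def find_dinucleotide_repeats(sequence, min_repeats=6):
    """Return (start, motif, repeat_count) for each dinucleotide SSR.

    A hit is a two-base motif repeated at least min_repeats times in
    tandem. Homopolymer runs such as AAAAAA are skipped.
    """
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if motif[0] == motif[1]:
            continue  # a run of a single base is not a dinucleotide SSR
        hits.append((match.start(), motif, len(match.group(0)) // 2))
    return hits


# Hypothetical sequence with a (GA)8 repeat between two short flanks.
test_sequence = "ACGTTGCC" + "GA" * 8 + "TTCAGGCT"
hits = find_dinucleotide_repeats(test_sequence)
```

Primers would then be placed in the unique sequence flanking a reported repeat, so that differences in repeat number appear as differences in amplicon length.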

Availability of supporting data 

The RNAseq reads have been deposited into the NCBI Short
Read Archive and are accessible under project SRP039356
(http://www.ncbi.nlm.nih.gov/sra/?term=SRP039356). Data
represent all reads from parental lines ('Mara des Bois'
and 'Elyana') as well as reads from individual F1 plants.

Additional file 1: Figure S1. An overview of the process used to
identify candidate genes through analysis of bulk segregation of
transcripts corresponding to a flavor volatile.

Additional file 2: The DNA and protein sequences of FAD1. 